Skip to main content
Top
Published in: 4OR 4/2021

01-12-2020 | Research Paper

A competitive optimization approach for data clustering and orthogonal non-negative matrix factorization

Authors: Ja’far Dehghanpour-Sahron, Nezam Mahdavi-Amiri

Published in: 4OR | Issue 4/2021

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Partitioning a given data-set into subsets based on similarity among the data is called clustering. Clustering is a major task in data mining and machine learning having many applications such as text retrieval, pattern recognition, and web mining. Here, we briefly review some clustering related problems (k-means, normalized k-cut, orthogonal non-negative matrix factorization, ONMF, and isoperimetry) and describe their connections. We formulate the relaxed mean version of the isoperimetry problem as an optimization problem with non-negative orthogonal constraints. We first make use of a gradient-based optimization algorithm to solve this kind of a problem, and then apply a post-processing technique to extract a solution of the clustering problem. Also, we propose a simplified approach to improve upon solution of the 2-dimensional clustering problem, using the N-nearest neighbor graph. Inspired by this technique, we apply a multilevel method for clustering a given data-set to reduce the size of the problem by grouping a number of similar vertices. The number is determined based on two values, namely, the maximum and the average of the edge weights of the vertices connected to a selected vertex. In addition, using the connections between ONMF and k-means and between k-means and the isoperimetry problem, we propose an algorithm to solve the ONMF problem. A comparative performance analysis of our approach with other related methods shows outperformance of our approach, in terms of the obtained misclassification error rate and Rand index, on both benchmark and randomly generated problems as well as hard synthetic data-sets.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literature
go back to reference Arthur D, Sergi V (2007) K-means++: the advantages of careful seeding. In: SODA ’07: proceedings of the eighteenth annual ACM-SIAM symposium on discrete algorithms, pp 1027–1035 Arthur D, Sergi V (2007) K-means++: the advantages of careful seeding. In: SODA ’07: proceedings of the eighteenth annual ACM-SIAM symposium on discrete algorithms, pp 1027–1035
go back to reference Barzilai J, Borwein JM (1988) Two-point step size gradient methods. IMA J Numer Anal 8(1):141–148CrossRef Barzilai J, Borwein JM (1988) Two-point step size gradient methods. IMA J Numer Anal 8(1):141–148CrossRef
go back to reference Daneshgar A, Hajiabolhassan H, Javadi R (2010) On the isoperimetric spectrum of graphs and its approximations. J Comb Theory Ser B 100(4):390–412CrossRef Daneshgar A, Hajiabolhassan H, Javadi R (2010) On the isoperimetric spectrum of graphs and its approximations. J Comb Theory Ser B 100(4):390–412CrossRef
go back to reference Daneshgar A, Javadi R, Razavi SS (2013) Clustering and outlier detection using isoperimetric number of trees. Pattern Recogn 46(12):3371–3382CrossRef Daneshgar A, Javadi R, Razavi SS (2013) Clustering and outlier detection using isoperimetric number of trees. Pattern Recogn 46(12):3371–3382CrossRef
go back to reference Ding C, He X, Simon HD (2005) On the equivalence of non-negative matrix factorization and spectral clustering. In: Proceedings of the SIAM international conference on data mining. Society for Industrial and Applied Mathematics, pp 606–610 Ding C, He X, Simon HD (2005) On the equivalence of non-negative matrix factorization and spectral clustering. In: Proceedings of the SIAM international conference on data mining. Society for Industrial and Applied Mathematics, pp 606–610
go back to reference Dhillon IS, Guan Y, Kulis B (2004) A unified view of kernel k-means, spectral clustering and graph cuts. Technical Report No. UTCS TR-04-25, Computer Science Department, University of Texas at Austin Dhillon IS, Guan Y, Kulis B (2004) A unified view of kernel k-means, spectral clustering and graph cuts. Technical Report No. UTCS TR-04-25, Computer Science Department, University of Texas at Austin
go back to reference Dolan ED, Moré JJ (2002) Benchmarking optimization software with performance profiles. Math Program 91(2):201–213CrossRef Dolan ED, Moré JJ (2002) Benchmarking optimization software with performance profiles. Math Program 91(2):201–213CrossRef
go back to reference Jiang B, Dai YH (2015) A framework of constraint preserving update schemes for optimization on Stiefel manifold. Math Program 153(2):535–575CrossRef Jiang B, Dai YH (2015) A framework of constraint preserving update schemes for optimization on Stiefel manifold. Math Program 153(2):535–575CrossRef
go back to reference Kim J, Park H (2011) Fast non-negative matrix factorization: an active-set-like method and comparisons. SIAM J Sci Comput 33(6):3261–3281CrossRef Kim J, Park H (2011) Fast non-negative matrix factorization: an active-set-like method and comparisons. SIAM J Sci Comput 33(6):3261–3281CrossRef
go back to reference Lancichinetti A, Fortunato S (2009) Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities. Phys Rev E 80(1):016118CrossRef Lancichinetti A, Fortunato S (2009) Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities. Phys Rev E 80(1):016118CrossRef
go back to reference Li T, Ding C (2006) The relationships among various non-negative matrix factorization methods for clustering. In: Data mining. ICDM’06. Sixth international conference on IEEE, pp 362–371 Li T, Ding C (2006) The relationships among various non-negative matrix factorization methods for clustering. In: Data mining. ICDM’06. Sixth international conference on IEEE, pp 362–371
go back to reference Ng AY, Jordan MI, Weiss Y (2002) On spectral clustering: analysis and an algorithm. In: Scholkopf B, Platt J, Hoffman T (eds) Advances in neural information processing systems. MIT Press, Cambridge, pp 849–856 Ng AY, Jordan MI, Weiss Y (2002) On spectral clustering: analysis and an algorithm. In: Scholkopf B, Platt J, Hoffman T (eds) Advances in neural information processing systems. MIT Press, Cambridge, pp 849–856
go back to reference Nocedal J, Wright SJ (2006) Numerical optimization. Springer series in operations research and financial engineering, 2nd edn. Springer, New York Nocedal J, Wright SJ (2006) Numerical optimization. Springer series in operations research and financial engineering, 2nd edn. Springer, New York
go back to reference Peng J, Wei Y (2007) Approximating k-means-type clustering via semidefinite programming. SIAM J Optim 18(1):186–205CrossRef Peng J, Wei Y (2007) Approximating k-means-type clustering via semidefinite programming. SIAM J Optim 18(1):186–205CrossRef
go back to reference Pompili F, Gillis N, Absil PA, Glineur F (2014) Two algorithms for orthogonal non-negative matrix factorization with application to clustering. Neurocomputing 141:15–25CrossRef Pompili F, Gillis N, Absil PA, Glineur F (2014) Two algorithms for orthogonal non-negative matrix factorization with application to clustering. Neurocomputing 141:15–25CrossRef
go back to reference Wen Z, Yin W (2013) A feasible method for optimization with orthogonality constraints. Math Program 142(1–2):397–434CrossRef Wen Z, Yin W (2013) A feasible method for optimization with orthogonality constraints. Math Program 142(1–2):397–434CrossRef
go back to reference West DB (2001) Introduction to graph theory, vol 2. Prentice-Hall, Upper Saddle River West DB (2001) Introduction to graph theory, vol 2. Prentice-Hall, Upper Saddle River
Metadata
Title
A competitive optimization approach for data clustering and orthogonal non-negative matrix factorization
Authors
Ja’far Dehghanpour-Sahron
Nezam Mahdavi-Amiri
Publication date
01-12-2020
Publisher
Springer Berlin Heidelberg
Published in
4OR / Issue 4/2021
Print ISSN: 1619-4500
Electronic ISSN: 1614-2411
DOI
https://doi.org/10.1007/s10288-020-00445-y

Other articles of this Issue 4/2021

4OR 4/2021 Go to the issue

Premium Partners